A Statistical Approach for Extracting Protein-Protein Interactions

Authors

  • Lindsey Bell
  • Jinfeng Zhang
  • Xufeng Niu
Abstract

Automatic extraction of protein-protein interaction (PPI) information from the scientific literature is important for building PPI databases, studying biological networks, and discovering new biological knowledge through automatic hypothesis generation. In this paper, we present a new method for PPI extraction based on a mixture of logistic models. The method automatically clusters interaction words (words that describe the interactions of protein pairs) into groups with similar grammatical properties, and a logistic model is fitted for each cluster of interaction words. The directionality of an interaction is an essential piece of information for many protein interactions and is important for building directed biological networks. Most current PPI extraction methods do not extract this directional information, in part because few corpora are annotated with directionality. We introduce a new corpus, PICAD, for evaluating PPI extraction tools that includes directional annotation. In addition, we propose an ensemble approach using logistic regression, Bayesian networks, and SVM for identifying PPIs. We show that using an ensemble of classifiers allows us to capture different features in the text, and we report an F-measure of 75.7% on our new corpus.
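The ensemble idea and the F-measure mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the three classifiers are combined, so simple majority voting is assumed here, and the label lists (`logistic`, `bayes_net`, `svm`, `gold`) are made-up toy data, with 1 meaning "the sentence describes an interaction" and 0 meaning it does not.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine several classifiers' label lists into one list by majority vote.

    Assumption: plain majority voting; the paper's actual combination rule
    is not given in the abstract.
    """
    combined = []
    for labels in zip(*predictions):
        # most_common(1) returns the label with the highest vote count
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def f_measure(gold, predicted, positive=1):
    """Balanced F-measure: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for six candidate protein pairs.
gold = [1, 0, 1, 1, 0, 0]
logistic = [1, 0, 1, 0, 0, 1]
bayes_net = [1, 1, 1, 1, 0, 0]
svm = [0, 0, 1, 1, 0, 0]

ensemble = majority_vote([logistic, bayes_net, svm])
print("ensemble labels:", ensemble)
print("F-measure (logistic only):", f_measure(gold, logistic))
print("F-measure (ensemble):", f_measure(gold, ensemble))
```

In this toy case each classifier makes different mistakes, so the vote corrects them; real classifiers often share errors, and the gain from an ensemble is correspondingly smaller.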

Similar resources

Discovering Domains Mediating Protein Interactions

Background: Protein-protein interactions do not provide any direct information regarding the domains within the proteins that mediate the interactions. The majority of proteins are multi-domain proteins, and the interaction between them is often defined by the pairs of their domains. Most of the former studies focus only on interacting domain pairs. However, they do not consider the in...

Extracting Protein-Protein Interactions from MEDLINE using the Hidden Vector State Model

Protein-protein interactions, referring to the associations of protein molecules, are crucial for many biological functions. A major challenge in text mining for biomedicine is automatically extracting protein-protein interactions from the vast amount of biomedical literature, since most knowledge about them remains hidden in biomedical publications. We have constructed an information extraction syst...

A New Thermodynamic Approach for Protein Partitioning in Reverse Micellar Solution

Reverse micellar systems are nanofluids with unique properties that make them attractive in high selectivity separation processes, especially for biological compounds. Understanding the phase behavior and thermodynamic properties of these nanosystems is the first step in process design. Separation of components by these nanosystems is performed upon contact of aqueous and reverse micellar phase...

Biological Applications of Isothermal Titration Calorimetry

Most biological phenomena are influenced by intermolecular recognition and interaction. Thus, understanding the thermodynamics of biomacromolecule-ligand interaction is a very interesting area in biochemistry and biotechnology. One of the most powerful techniques to obtain precise information about the energetics of (bio)molecules binding to other biological macromolecules is isoth...

Rabies Infection: An Overview of Lyssavirus-Host Protein Interactions

Viruses are obligate intracellular parasites that use cell proteins to take control of cell functions in order to complete their life cycle. Studying viral-host interactions would increase our knowledge of viral biology and mechanisms of pathogenesis. Studies on pathogenesis mechanisms of lyssaviruses, which are the causative agents of rabies, have revealed some important ho...

Extracting Protein-Protein Interactions from MEDLINE using the Hidden Vector State model

A major challenge in text mining for biomedicine is automatically extracting protein-protein interactions from the vast amount of biomedical literature. We have constructed an information extraction system based on the Hidden Vector State (HVS) model for protein-protein interactions. The HVS model can be trained using only lightly annotated data whilst simultaneously retaining sufficient abilit...

Publication date: 2011